Efficient and scalable parametric stereo coding for low bitrate audio coding applications

ABSTRACT

The present invention provides improvements to prior art audio codecs that generate a stereo illusion through post-processing of a received mono signal. These improvements are accomplished by extraction of stereo-image describing parameters at the encoder side, which are transmitted and subsequently used for control of a stereo generator at the decoder side. Furthermore, the invention bridges the gap between simple pseudo-stereo methods and current methods of true stereo coding by using a new form of parametric stereo coding. A stereo-balance parameter is introduced, which enables more advanced stereo modes and in addition forms the basis of a new method of stereo coding of spectral envelopes, of particular use in systems where guided HFR (High Frequency Reconstruction) is employed. As a special case, the application of this stereo-coding scheme in scalable HFR-based codecs is described.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/744,586 filed on Jan. 16, 2020, which is a continuation of U.S. patent application Ser. No. 16/399,705 filed on Apr. 30, 2019, which issued on Jan. 21, 2020 as U.S. Pat. No. 10,540,982, which is a continuation of U.S. patent application Ser. No. 16/157,899 filed on Oct. 11, 2018, which issued on May 21, 2019 as U.S. Pat. No. 10,297,261, which is a continuation of U.S. patent application Ser. No. 14/078,456 filed on Nov. 12, 2013, which is now abandoned, which is a continuation of U.S. patent application Ser. No. 12/610,186 filed on Oct. 30, 2009, which issued on Dec. 10, 2013 as U.S. Pat. No. 8,605,911, which is a divisional of U.S. patent application Ser. No. 11/238,982 filed on Sep. 28, 2005, which issued on Feb. 14, 2012 as U.S. Pat. No. 8,116,460, which is a divisional of U.S. patent application Ser. No. 10/483,453 filed on Jan. 8, 2004, which issued on Jun. 3, 2008 as U.S. Pat. No. 7,382,886, which claims priority to PCT/SE02/01372, filed Jul. 10, 2002, which claims priority to Swedish Application Serial No. 0102481-9, filed Jul. 10, 2001, Swedish Application Serial No. 0200796-1, filed Mar. 15, 2002, and Swedish Application Serial No. 0202159-0, filed Jul. 9, 2002, each of which is herein incorporated by reference.

BACKGROUND OF THE INVENTION

Technical Field

The present invention relates to low bitrate audio source coding systems. Different parametric representations of stereo properties of an input signal are introduced, and the application thereof at the decoder side is explained, ranging from pseudo-stereo to full stereo coding of spectral envelopes, the latter of which is especially suited for HFR-based codecs.

Description of the Related Art

Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. At medium to high bitrates, natural audio coding is commonly used for speech and music signals, and stereo transmission and reproduction are possible. In applications where only low bitrates are available, e.g. Internet streaming audio targeted at users with slow telephone modem connections, or in the emerging digital AM broadcasting systems, mono coding of the audio program material is unavoidable. However, a stereo impression is still desirable, in particular when listening with headphones, in which case a pure mono signal is perceived as originating from “within the head”, which can be an unpleasant experience.

One approach to address this problem is to synthesize a stereo signal at the decoder side from a received pure mono signal. Throughout the years, several different “pseudo-stereo” generators have been proposed. For example, [U.S. Pat. No. 5,883,962] describes enhancement of mono signals by means of adding delayed/phase-shifted versions of a signal to the unprocessed signal, thereby creating a stereo illusion. Here, the processed signal is added to the original signal for each of the two outputs at equal levels but with opposite signs, ensuring that the enhancement signals cancel if the two channels are added later on in the signal path. In [PCT WO 98/57436] a similar system is shown, albeit without the above mono-compatibility of the enhanced signal. Prior art methods have in common that they are applied as pure post-processes. In other words, no information on the degree of stereo-width, let alone position in the stereo sound stage, is available to the decoder. Thus, the pseudo-stereo signal may or may not resemble the stereo character of the original signal. A particular situation where prior art systems fall short is when the original signal is a pure mono signal, which is often the case for speech recordings. This mono signal is blindly converted to a synthetic stereo signal at the decoder, which in the speech case often causes annoying artifacts, and may reduce clarity and speech intelligibility.

Other prior art systems, aiming at true stereo transmission at low bitrates, typically employ a sum and difference coding scheme. In such a scheme, the original left (L) and right (R) signals are converted to a sum signal, S=(L+R)/2, and a difference signal, D=(L−R)/2, which are subsequently encoded and transmitted. The receiver decodes the S and D signals, whereupon the original L/R-signal is recreated through the operations L=S+D and R=S−D. The advantage of this is that very often a redundancy between L and R is at hand, whereby the information in D to be encoded is less, requiring fewer bits, than in S. Clearly, the extreme case is a pure mono signal, i.e. L and R are identical. A traditional L/R-codec encodes this mono signal twice, whereas an S/D codec detects this redundancy, and the D signal does (ideally) not require any bits at all. Another extreme is represented by the situation where R=−L, corresponding to “out of phase” signals. Now, the S signal is zero, whereas the D signal computes to L. Again, the S/D-scheme has a clear advantage over standard L/R-coding. However, consider the situation where e.g. R=0 during a passage, which was not uncommon in the early days of stereo recordings. Both S and D equal L/2, and the S/D-scheme does not offer any advantage. On the contrary, L/R-coding handles this very well: the R signal does not require any bits. For this reason, prior art codecs employ adaptive switching between those two coding schemes, depending on which method is most beneficial at a given moment. The above examples are merely theoretical (except for the dual mono case, which is common in speech-only programs). In practice, real world stereo program material contains significant amounts of stereo information, and even if the above switching is implemented, the resulting bitrate is often still too high for many applications.
Furthermore, as can be seen from the resynthesis relations above, very coarse quantization of the D signal in an attempt to further reduce the bitrate is not feasible, since the quantization errors translate to non-negligible level errors in the L and R signals.
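The sum/difference transform and its inverse can be sketched in a few lines of Python. The function names `sd_encode` and `sd_decode` are illustrative, and plain lists stand in for sample buffers. The two printed cases reproduce the dual-mono and R=0 examples discussed above.

```python
def sd_encode(left, right):
    """Convert L/R sample lists to sum S=(L+R)/2 and difference D=(L-R)/2."""
    s = [(l + r) / 2 for l, r in zip(left, right)]
    d = [(l - r) / 2 for l, r in zip(left, right)]
    return s, d


def sd_decode(s, d):
    """Recreate the original channels through L=S+D and R=S-D."""
    left = [a + b for a, b in zip(s, d)]
    right = [a - b for a, b in zip(s, d)]
    return left, right


# Dual mono (L == R): the difference signal is identically zero.
s, d = sd_encode([0.5, -0.25], [0.5, -0.25])
print(d)  # [0.0, 0.0]

# R == 0: S and D both equal L/2, so S/D coding offers no advantage.
s, d = sd_encode([0.5, -0.25], [0.0, 0.0])
print(s, d)  # [0.25, -0.125] [0.25, -0.125]
```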

SUMMARY OF THE INVENTION

The present invention employs detection of signal stereo properties prior to coding and transmission. In the simplest form, a detector measures the amount of stereo perspective that is present in the input stereo signal. This amount is then transmitted as a stereo width parameter, together with an encoded mono sum of the original signal. The receiver decodes the mono signal, and applies the proper amount of stereo-width, using a pseudo-stereo generator, which is controlled by said parameter. As a special case, a mono input signal is signaled as zero stereo width, and correspondingly no stereo synthesis is applied in the decoder. According to the invention, useful measures of the stereo-width can be derived e.g. from the difference signal or from the cross-correlation of the original left and right channels. The values of such computations can be mapped to a small number of states, which are transmitted at an appropriate fixed rate in time, or on an as-needed basis. The invention also teaches how to filter the synthesized stereo components, in order to reduce the risk of unmasking coding artifacts which are typically associated with low bitrate coded signals.

Alternatively, the overall stereo-balance or localization in the stereo field is detected in the encoder. This information, optionally together with the above width-parameter, is efficiently transmitted as a balance-parameter, along with the encoded mono signal. Thus, displacements to either side of the sound stage can be recreated at the decoder, by correspondingly altering the gains of the two output channels. According to the invention, this stereo-balance parameter can be derived from the quotient of the left and right signal powers. The transmission of both types of parameters requires very few bits compared to full stereo coding, whereby the total bitrate demand is kept low. In a more elaborate version of the invention, which offers a more accurate parametric stereo depiction, several balance and stereo-width parameters are used, each one representing a separate frequency band.

The balance-parameter generalized to a per-frequency-band operation, together with a corresponding per-band operation of a level-parameter, calculated as the sum of the left and right signal powers, enables a new, arbitrarily detailed representation of the power spectral density of a stereo signal. A particular benefit of this representation, in addition to the benefits from stereo redundancy that S/D-systems also take advantage of, is that the balance-signal can be quantized with less precision than the level-signal, since the quantization error, when converting back to a stereo spectral envelope, causes an “error in space”, i.e. in perceived localization in the stereo panorama, rather than an error in level. Analogous to a traditional switched L/R- and S/D-system, the level/balance-scheme can be adaptively switched off in favor of a levelL/levelR-signal, which is more efficient when the overall signal is heavily offset towards either channel. The above spectral envelope coding scheme can be used whenever an efficient coding of power spectral envelopes is required, and can be incorporated as a tool in new stereo source codecs. A particularly interesting application is in HFR systems that are guided by information about the original signal highband envelope. In such a system, the lowband is coded and decoded by means of an arbitrary codec, and the highband is regenerated at the decoder using the decoded lowband signal and the transmitted highband envelope information [PCT WO 98/57436]. Furthermore, the possibility to build a scalable HFR-based stereo codec is offered, by locking the envelope coding to level/balance operation. Here the level values are fed into the primary bitstream, which, depending on the implementation, typically decodes to a mono signal.
The balance values are fed into the secondary bitstream, which in addition to the primary bitstream is available to receivers close to the transmitter, taking an IBOC (In-Band On-Channel) digital AM-broadcasting system as an example. When the two bitstreams are combined, the decoder produces a stereo output signal. In addition to the level values, the primary bitstream can contain stereo parameters, e.g. a width parameter. Thus, decoding of this bitstream alone already yields a stereo output, which is improved when both bitstreams are available.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the invention, with reference to the accompanying drawings, in which:

FIG. 1 illustrates a source coding system containing an encoder enhanced by a parametric stereo encoder module, and a decoder enhanced by a parametric stereo decoder module.

FIG. 2a is a block schematic of a parametric stereo decoder module,

FIG. 2b is a block schematic of a pseudo-stereo generator with control parameter inputs,

FIG. 2c is a block schematic of a balance adjuster with control parameter inputs,

FIG. 3 is a block schematic of a parametric stereo decoder module using multiband pseudo-stereo generation combined with multiband balance adjustment,

FIG. 4a is a block schematic of the encoder side of a scalable HFR-based stereo codec, employing level/balance-coding of the spectral envelope,

FIG. 4b is a block schematic of the corresponding decoder side.

DESCRIPTION OF PREFERRED EMBODIMENTS

The below-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims, and not by the specific details presented by way of description and explanation of the embodiments herein. For the sake of clarity, all examples below assume two-channel systems, but, as will be apparent to others skilled in the art, the methods can be applied to multichannel systems, such as a 5.1 system.

FIG. 1 shows how an arbitrary source coding system comprising an encoder, 107, and a decoder, 115, where encoder and decoder operate in monaural mode, can be enhanced by parametric stereo coding according to the invention. Let L and R denote the left and right analog input signals, which are fed to an AD-converter, 101. The output from the AD-converter is converted to mono, 105, and the mono signal is encoded, 107. In addition, the stereo signal is routed to a parametric stereo encoder, 103, which calculates one or several stereo parameters to be described below. Those parameters are combined with the encoded mono signal by means of a multiplexer, 109, forming a bitstream, 111. The bitstream is stored or transmitted, and subsequently extracted at the decoder side by means of a demultiplexer, 113. The mono signal is decoded, 115, and converted to a stereo signal by a parametric stereo decoder, 119, which uses the stereo parameter(s), 117, as control signal(s). Finally, the stereo signal is routed to the DA-converter, 121, which feeds the analog outputs, L′ and R′. The topology according to FIG. 1 is common to a set of parametric stereo coding methods which will be described in detail, starting with the less complex versions.

One method of parameterization of stereo properties according to the present invention is to determine the original signal stereo-width at the encoder side. A first approximation of the stereo-width is the difference signal, D=L−R, since, roughly put, a high degree of similarity between L and R computes to a small value of D, and vice versa. A special case is dual mono, where L=R and thus D=0. Thus, even this simple algorithm is capable of detecting the type of mono input signal commonly associated with news broadcasts, in which case pseudo-stereo is not desired. However, a mono signal that is fed to L and R at different levels does not yield a zero D signal, even though the perceived width is zero. Thus, in practice more elaborate detectors might be required, employing for example cross-correlation methods. One should make sure that the value describing the left-right difference or correlation is in some way normalized with the total signal level, in order to achieve a level-independent detector. A problem with the aforementioned detector is the case when mono speech is mixed with a much weaker stereo signal, e.g. stereo noise or background music during speech-to-music/music-to-speech transitions. In the speech pauses the detector will then indicate a wide stereo signal. This is solved by normalizing the stereo-width value with a signal containing information on the previous total energy level, e.g. a peak decay signal of the total energy. Furthermore, to prevent the stereo-width detector from being triggered by high frequency noise or by high frequency distortion that differs between the channels, the detector signals should be pre-filtered by a low-pass filter, typically with a cutoff frequency somewhere above a voice's second formant, and optionally also by a high-pass filter to avoid unbalanced signal offsets or hum. Regardless of detector type, the calculated stereo-width is mapped to a finite set of values, covering the entire range, from mono to wide stereo.
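A minimal sketch of a level-independent width detector along these lines, assuming the detector signals have already been low-pass (and optionally high-pass) pre-filtered. The function names, the correlation-based measure `1 - rho`, the peak-decay factor and the eight output states are hypothetical choices for illustration, not values given in this description.

```python
import math


def frame_width(left, right, eps=1e-12):
    """Width estimate in [0, 1] for one frame: 1 - normalized cross-correlation.

    Mono material gives about 0 even when L and R differ in level;
    uncorrelated or anti-phase material is clipped to 1.
    """
    lr = sum(l * r for l, r in zip(left, right))
    ll = sum(l * l for l in left)
    rr = sum(r * r for r in right)
    rho = lr / math.sqrt(ll * rr + eps)
    return min(1.0, max(0.0, 1.0 - rho))


def width_track(frames, decay=0.9, n_states=8, eps=1e-12):
    """Map a sequence of (left, right) frames to integer width states.

    The width is scaled by the frame energy relative to a peak-decay
    tracker of the total energy, so a weak stereo background during
    speech pauses does not register as wide stereo.
    """
    peak = 0.0
    states = []
    for left, right in frames:
        energy = sum(l * l for l in left) + sum(r * r for r in right)
        peak = max(energy, peak * decay)  # peak-decay signal of total energy
        w = frame_width(left, right) * energy / (peak + eps)
        states.append(round(w * (n_states - 1)))  # 0 = mono, n_states-1 = wide
    return states
```

A loud mono frame followed by a faint uncorrelated frame yields states `[0, 0]`, whereas the same uncorrelated frame at full level on its own maps to the widest state.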

FIG. 2a gives an example of the contents of the parametric stereo decoder introduced in FIG. 1. The block denoted ‘balance’, 211, controlled by parameter B, will be described later, and should be regarded as bypassed for now. The block denoted ‘width’, 205, takes a mono input signal and synthetically recreates the impression of stereo width, where the amount of width is controlled by the parameter W. The optional parameters S and D will be described later. According to the invention, a subjectively better sound quality can often be achieved by incorporating a crossover filter comprising a low-pass filter, 203, and a high-pass filter, 201, in order to keep the low frequency range “tight” and unaffected. Only the output from the high-pass filter is thus routed to the width block. The stereo output from the width block is added to the mono output from the low-pass filter by means of 207 and 209, forming the stereo output signal.

Any prior art pseudo-stereo generator can be used for the width block, such as those mentioned in the background section, or a Schroeder-type early reflection simulating unit (multitap delay) or reverberator. FIG. 2b gives an example of a pseudo-stereo generator, fed by a mono signal M. The amount of stereo-width is determined by the gain of 215, and this gain is a function of the stereo-width parameter, W. The higher the gain, the wider the stereo impression; a zero gain corresponds to pure mono reproduction. The output from 215 is delayed, 221, and added, 223 and 225, to the two direct signal instances, using opposite signs. In order not to significantly alter the overall reproduction level when changing the stereo-width, a compensating attenuation of the direct signal can be incorporated, 213. For example, if the gain of the delayed signal is G, the gain of the direct signal can be selected as sqrt(1−G²). According to the invention, a high frequency roll-off can be incorporated in the delay signal path, 217, which helps avoid pseudo-stereo-induced unmasking of coding artifacts. Optionally, crossover filter, roll-off filter and delay parameters can be sent in the bitstream, offering more possibilities to mimic the stereo properties of the original signal, as also shown in FIGS. 2a and 2b as the signals X, S and D. If a reverberation unit is used for generating a stereo signal, the reverberation decay might sometimes be unwanted after the very end of a sound. These unwanted reverb tails can, however, easily be attenuated or completely removed by simply altering the gain of the reverb signal. A detector designed for finding sound endings can be used for that purpose. If the reverberation unit generates artifacts for some specific signals, e.g. transients, a detector for those signals can also be used for attenuating them.
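The width block of FIG. 2b can be sketched as follows, assuming an integer sample delay and a simple one-pole low-pass as the roll-off filter 217 (the description does not specify the filter type). The direct path is scaled by sqrt(1−G²) as suggested above, so if the delayed copy is uncorrelated with the direct signal each output keeps roughly the input power. Because the side signal is added with opposite signs, L+R contains only the direct path.

```python
import math


def pseudo_stereo(mono, g, delay, rolloff=0.0):
    """Sketch of a FIG. 2b style width block.

    g:       gain of the delayed path (block 215), 0 <= g <= 1; 0 gives pure mono.
    delay:   delay of the side path in samples (block 221).
    rolloff: coefficient of an assumed one-pole low-pass in the delayed
             path (block 217); 0 disables the roll-off.
    """
    direct_gain = math.sqrt(1.0 - g * g)  # compensating attenuation, block 213
    left, right = [], []
    state = 0.0
    for n, x in enumerate(mono):
        delayed = mono[n - delay] if n >= delay else 0.0
        state = (1.0 - rolloff) * delayed + rolloff * state  # HF roll-off
        side = g * state
        direct = direct_gain * x
        left.append(direct + side)   # adder 223
        right.append(direct - side)  # adder 225, opposite sign
    return left, right
```

With `g=0.6` and `delay=2`, an impulse at the input gives a direct part of 0.8 on both channels and a delayed part of +0.6 on the left and -0.6 on the right, so the side contribution cancels in a mono downmix.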

An alternative method of detecting stereo properties according to the invention is described as follows. Again, let L and R denote the left and right input signals. The corresponding signal powers are then given by P_(L)~L² and P_(R)~R². Now, a measure of the stereo-balance can be calculated as the quotient of the two signal powers, or more specifically as B=(P_(L)+e)/(P_(R)+e), where e is an arbitrary, very small number, which eliminates division by zero. The balance parameter, B, can be expressed in dB by the relation B_(dB)=10 log₁₀(B). As an example, the three cases P_(L)=10P_(R), P_(L)=P_(R), and P_(L)=0.1P_(R) correspond to balance values of +10 dB, 0 dB, and −10 dB respectively. Clearly, those values map to the locations “left”, “center”, and “right”. Experiments have shown that the span of the balance parameter can be limited to, for example, +/−40 dB, since those extreme values are already perceived as if the sound originates entirely from one of the two loudspeakers or headphone drivers. This limitation reduces the signal space to cover in the transmission, thus offering bitrate reduction. Furthermore, a progressive quantization scheme can be used, whereby smaller quantization steps are used around zero, and larger steps towards the outer limits, which further reduces the bitrate. Often the balance is constant over time for extended passages. Thus, a last step to significantly reduce the average number of bits needed can be taken: after transmission of an initial balance value, only the differences between consecutive balance values are transmitted, whereby entropy coding is employed. Very commonly, this difference is zero, which is thus signaled by the shortest possible codeword. Clearly, in applications where bit errors are possible, this delta coding must be reset at an appropriate time interval, in order to eliminate uncontrolled error propagation.
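A sketch of the encoder-side balance processing, assuming frame powers are already available. The +/−40 dB clamp follows the text; the particular progressive reconstruction table and the reset interval of the delta coder are hypothetical, and the actual entropy coder is left out.

```python
import math

# Hypothetical progressive quantizer: reconstruction levels (dB) are dense
# around 0 dB and sparse towards the +/-40 dB limits.
LEVELS = [0, 2, 4, 6, 9, 12, 16, 20, 25, 30, 40]
TABLE = sorted({-v for v in LEVELS} | set(LEVELS))


def balance_db(p_left, p_right, e=1e-10, limit_db=40.0):
    """B = (P_L + e) / (P_R + e) in dB, clamped to +/- limit_db."""
    b_db = 10.0 * math.log10((p_left + e) / (p_right + e))
    return max(-limit_db, min(limit_db, b_db))


def quantize(b_db):
    """Index of the nearest reconstruction level in TABLE."""
    return min(range(len(TABLE)), key=lambda i: abs(TABLE[i] - b_db))


def delta_code(indices, reset_every=16):
    """Absolute index every `reset_every` values, otherwise the difference to
    the previous index (to be entropy coded; zero for a static balance)."""
    out = []
    for n, idx in enumerate(indices):
        if n % reset_every == 0:
            out.append(("abs", idx))
        else:
            out.append(("delta", idx - indices[n - 1]))
    return out
```

The periodic `("abs", ...)` entries are the resets that bound error propagation when bit errors can occur.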

The most rudimentary decoder usage of the balance parameter is simply to offset the mono signal towards either of the two reproduction channels, by feeding the mono signal to both outputs and adjusting the gains correspondingly, as illustrated in FIG. 2c, blocks 227 and 229, with the control signal B. This is analogous to turning the “panorama” knob on a mixing desk, synthetically “moving” a mono signal between the two stereo speakers.
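One way to derive the two gains of blocks 227 and 229 from B is a constant-power law whose squared gains split the power in the ratio B:1, matching the reverse relations P_(L)=BP/(B+1) and P_(R)=P/(B+1) given later in this description. This panning law is an assumption; the text only states that the gains are adjusted correspondingly.

```python
import math


def balance_gains(b_db):
    """Assumed constant-power gains (g_L, g_R) with g_L**2 / g_R**2 = B
    and g_L**2 + g_R**2 = 1, where B = 10**(b_db / 10)."""
    b = 10.0 ** (b_db / 10.0)
    return math.sqrt(b / (1.0 + b)), math.sqrt(1.0 / (1.0 + b))


def apply_balance(mono, b_db):
    """Feed the mono signal to both outputs with balance-dependent gains."""
    g_left, g_right = balance_gains(b_db)
    return [g_left * x for x in mono], [g_right * x for x in mono]
```

At 0 dB both gains equal sqrt(0.5), i.e. a centered source; at +10 dB the left output carries ten times the power of the right output.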

The balance parameter can be sent in addition to the above-described width parameter, offering the possibility to both position and spread the sound image in the sound stage in a controlled manner, which offers flexibility when mimicking the original stereo impression. One problem with combining pseudo-stereo generation, as mentioned in a previous section, and parameter-controlled balance is unwanted signal contribution from the pseudo-stereo generator at balance positions far from the center position. This is solved by applying a mono-favoring function to the stereo-width value, resulting in a greater attenuation of the stereo-width value at extreme side balance positions and less or no attenuation at balance positions close to the center position.
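The shape of the mono-favoring function is not specified further. The sketch below assumes a hypothetical piecewise-linear shape: no attenuation inside a central knee, falling to zero width at the +/−40 dB balance limit.

```python
def mono_favoring(width, b_db, knee_db=6.0, limit_db=40.0):
    """Attenuate the stereo-width value as the balance moves off-center.

    knee_db and limit_db are hypothetical tuning values: no attenuation
    for |B| <= knee_db, linear fall-off to zero width at |B| = limit_db.
    """
    offset = abs(b_db)
    if offset <= knee_db:
        factor = 1.0
    elif offset >= limit_db:
        factor = 0.0
    else:
        factor = (limit_db - offset) / (limit_db - knee_db)
    return width * factor
```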

The methods described so far are intended for very low bitrate applications. In applications where higher bitrates are available, it is possible to use more elaborate versions of the above width and balance methods. Stereo-width detection can be made in several frequency bands, resulting in individual stereo-width values for each frequency band. Similarly, balance calculation can operate in a multiband fashion, which is equivalent to applying different filter curves to the two channels that are fed by a mono signal. FIG. 3 shows an example of a parametric stereo decoder using a set of N pseudo-stereo generators according to FIG. 2b, represented by blocks 307, 317 and 327, combined with multiband balance adjustment, represented by blocks 309, 319 and 329, as described in FIG. 2c. The individual passbands are obtained by feeding the mono input signal, M, to a set of bandpass filters, 305, 315 and 325. The bandpass stereo outputs from the balance adjusters are added, 311, 321, 313, 323, forming the stereo output signal, L and R. The formerly scalar width and balance parameters are now replaced by the arrays W(k) and B(k). In FIG. 3, every pseudo-stereo generator and balance adjuster has unique stereo parameters. However, in order to reduce the total amount of data to be transmitted or stored, parameters from several frequency bands can be averaged in groups at the encoder, and this smaller number of parameters can be mapped to the corresponding groups of width and balance blocks at the decoder. Clearly, different grouping schemes and lengths can be used for the arrays W(k) and B(k). S(k) represents the gains of the delay signal paths in the width blocks, and D(k) represents the delay parameters. Again, S(k) and D(k) are optional in the bitstream.
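Parameter grouping can be sketched as averaging at the encoder and replication at the decoder. The group sizes are hypothetical, and averaging balance values in the dB domain is likewise an assumption.

```python
def group_average(values, group_sizes):
    """Encoder side: average consecutive per-band parameters in groups."""
    if sum(group_sizes) != len(values):
        raise ValueError("group sizes must cover all bands exactly")
    out, start = [], 0
    for size in group_sizes:
        chunk = values[start:start + size]
        out.append(sum(chunk) / len(chunk))
        start += size
    return out


def expand_groups(grouped, group_sizes):
    """Decoder side: map each grouped value onto its group of width/balance blocks."""
    out = []
    for value, size in zip(grouped, group_sizes):
        out.extend([value] * size)
    return out
```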

The parametric balance coding method can, especially for lower frequency bands, give somewhat unstable behavior, due to lack of frequency resolution, or due to too many sound events occurring in one frequency band at the same time but at different balance positions. Those balance glitches are usually characterized by a deviant balance value during just a short period of time, typically one or a few consecutive calculated values, depending on the update rate. In order to avoid disturbing balance glitches, a stabilization process can be applied to the balance data. This process may use a number of balance values before and after the current time position to calculate their median value. The median value can subsequently be used as a limiter value for the current balance value, i.e. the current balance value should not be allowed to go beyond the median value. The current value is then limited by the range between the last value and the median value. Optionally, the current balance value can be allowed to pass the limited values by a certain overshoot factor. Furthermore, the overshoot factor, as well as the number of balance values used for calculating the median, should be seen as frequency-dependent properties and hence be individual for each frequency band.
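A sketch of the median-based stabilization under one reading of the text: the current value is clamped to the range spanned by the previous output value and the local median, and the overshoot factor is interpreted as a fraction of that range. The window size and overshoot are hypothetical per-band settings. Since the median uses future values, a real implementation would delay the balance data by `half_window` updates.

```python
import statistics


def stabilize(balance, half_window=2, overshoot=0.0):
    """Suppress short balance glitches in one frequency band (sketch)."""
    out = []
    for n, current in enumerate(balance):
        window = balance[max(0, n - half_window): n + half_window + 1]
        med = statistics.median(window)
        last = out[-1] if out else current
        lo, hi = min(last, med), max(last, med)
        margin = overshoot * (hi - lo)  # assumed meaning of the overshoot factor
        out.append(min(max(current, lo - margin), hi + margin))
    return out
```

A single-value spike such as `[0, 0, 0, 20, 0, 0, 0]` is removed entirely, while a genuine step such as `[0, 0, 0, 10, 10, 10, 10]` passes unchanged.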

At low update rates of the balance information, the lack of time resolution can cause failure in synchronization between motions of the stereo image and the actual sound events. To improve this behavior in terms of synchronization, an interpolation scheme based on identifying sound events can be used. Interpolation here refers to interpolation between two balance values that are consecutive in time. By studying the mono signal at the receiver side, information about beginnings and ends of different sound events can be obtained. One way is to detect a sudden increase or decrease of signal energy in a particular frequency band. Guided by that energy envelope in time, the interpolation should make sure that changes in balance position are preferably performed during time segments containing little signal energy. Since the human ear is more sensitive to the entries than to the trailing parts of a sound, the interpolation scheme benefits from finding the beginning of a sound by e.g. applying peak-hold to the energy and then letting the balance value increments be a function of the peak-held energy, where a small energy value gives a large increment and vice versa. For time segments containing energy uniformly distributed in time, i.e. as for some stationary signals, this interpolation method equals linear interpolation between the two balance values. If the balance values are quotients of left and right energies, logarithmic balance values are preferred, for left-right symmetry reasons. Another advantage of applying the whole interpolation algorithm in the logarithmic domain is the human ear's tendency to relate levels on a logarithmic scale.
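A minimal sketch of the energy-guided interpolation between two consecutive balance values in dB, assuming per-subframe band energies of the decoded mono signal are available. The peak-hold decay and the choice of increments inversely proportional to the peak-held energy are illustrative; with uniform energy the path reduces to linear interpolation, as stated above.

```python
def guided_interpolation(b_start_db, b_end_db, energies, decay=0.8, eps=1e-9):
    """Balance path (dB) from b_start_db to b_end_db over len(energies) steps.

    Increments are inversely proportional to the peak-held energy, so most
    of the movement happens where the signal is quiet, i.e. before onsets.
    """
    peak, held = 0.0, []
    for e in energies:
        peak = max(e, peak * decay)  # peak-hold with decay (assumed detector)
        held.append(peak)
    weights = [1.0 / (h + eps) for h in held]
    total = sum(weights)
    path, acc = [], 0.0
    for w in weights:
        acc += w / total  # fraction of the total balance change done so far
        path.append(b_start_db + acc * (b_end_db - b_start_db))
    return path
```

For energies `[0, 0, 10, 10]` nearly the whole move from 0 dB to 10 dB is completed during the two silent subframes, before the sound entry.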

Also, for low update rates of the stereo-width gain values, interpolation can be applied to them. A simple way is to interpolate linearly between two stereo-width values that are consecutive in time. More stable behavior of the stereo-width can be achieved by smoothing the stereo-width gain values over a longer time segment containing several stereo-width parameters. By utilizing smoothing with different attack and release time constants, a system well suited for program material containing mixed or interleaved speech and music is achieved. An appropriate design of such a smoothing filter uses a short attack time constant, to get a short rise time and hence an immediate response to music entries in stereo, and a long release time constant, to get a long fall time. To be able to switch quickly from a wide stereo mode to mono, which can be desirable for sudden speech entries, there is a possibility to bypass or reset the smoothing filter by signaling this event. Furthermore, attack time constants, release time constants and other smoothing filter characteristics can also be signaled by an encoder.
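The asymmetric smoothing can be sketched as a one-pole filter whose coefficient depends on the direction of change. The coefficient values and the `resets` argument, which models a signaled bypass/reset, are hypothetical.

```python
def smooth_width(values, attack=0.5, release=0.05, resets=()):
    """Asymmetric one-pole smoothing of stereo-width gains.

    attack/release are per-update coefficients in (0, 1]: a large attack
    coefficient gives a fast rise, a small release coefficient a slow fall.
    At indices in `resets` (a signaled reset, e.g. a sudden speech entry)
    the output jumps directly to the input value.
    """
    out, state = [], None
    for n, target in enumerate(values):
        if state is None or n in resets:
            state = target
        else:
            coeff = attack if target > state else release
            state += coeff * (target - state)
        out.append(state)
    return out
```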

For signals containing masked distortion from a psychoacoustic codec, one common problem with introducing stereo information based on the coded mono signal is an unmasking effect of the distortion. This phenomenon, usually referred to as “stereo-unmasking”, is the result of non-centered sounds that do not fulfill the masking criterion. The problem with stereo-unmasking might be solved, or partly solved, by introducing at the decoder side a detector aimed at such situations. Known technologies for measuring signal-to-mask ratios can be used to detect potential stereo-unmasking. Once detected, it can be explicitly signaled, or the stereo parameters can simply be decreased.

At the encoder side, one option, as taught by the invention, is to apply a Hilbert transformer to the input signal, i.e. a 90 degree phase shift between the two channels is introduced. When subsequently forming the mono signal by addition of the two signals, a better balance between a center-panned mono signal and “true” stereo signals is achieved, since the Hilbert transformation introduces a 3 dB attenuation for center information. In practice, this improves mono coding of e.g. contemporary pop music, where for instance the lead vocals and the bass guitar are commonly recorded using a single mono source.
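The 3 dB figure can be checked numerically for a center-panned sinusoid, using the sine as the ideal 90-degree-shifted (Hilbert-transformed) version of the cosine. This is an idealized check; a practical Hilbert transformer only approximates the 90 degree shift over a limited frequency range.

```python
import math

N = 1000  # samples in one period (arbitrary)
x = [math.cos(2 * math.pi * n / N) for n in range(N)]        # center-panned L = R
x_shift = [math.sin(2 * math.pi * n / N) for n in range(N)]  # 90-degree shifted copy


def power(sig):
    """Mean square value of a sample list."""
    return sum(s * s for s in sig) / len(sig)


direct_sum = [a + a for a in x]                    # L + R without phase shift
hilbert_sum = [a + b for a, b in zip(x, x_shift)]  # L + 90-degree shifted R
ratio_db = 10 * math.log10(power(direct_sum) / power(hilbert_sum))
print(round(ratio_db, 2))  # 3.01
```

For uncorrelated left and right content the phase shift leaves the expected power of the sum unchanged, which is why center material is attenuated relative to "true" stereo components.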

The multiband balance-parameter method is not limited to the type of application described in FIG. 1. It can be advantageously used whenever the objective is to efficiently encode the power spectral envelope of a stereo signal. Thus, it can be used as a tool in stereo codecs, where, in addition to the stereo spectral envelope, a corresponding stereo residual is coded. Let the total power P be defined by P=P_(L)+P_(R), where P_(L) and P_(R) are signal powers as described above. Note that this definition does not take left to right phase relations into account. (E.g. identical left and right signals of opposite sign do not yield a zero total power.) Analogous to B, P can be expressed in dB as P_(dB)=10 log₁₀(P/P_(ref)), where P_(ref) is an arbitrary reference power, and the delta values can be entropy coded. As opposed to the balance case, no progressive quantization is employed for P. In order to represent the spectral envelope of a stereo signal, P and B are calculated for a set of frequency bands, typically, but not necessarily, with bandwidths that are related to the critical bands of human hearing. For example, those bands may be formed by grouping of channels in a constant bandwidth filterbank, whereby P_(L) and P_(R) are calculated as the time and frequency averages of the squares of the subband samples corresponding to the respective band and period in time. The sets P₀, P₁, P₂, . . . , P_(N-1) and B₀, B₁, B₂, . . . , B_(N-1), where the subscripts denote the frequency band in an N band representation, are delta and Huffman coded, transmitted or stored, and finally decoded into the quantized values that were calculated in the encoder. The last step is to convert P and B back to P_(L) and P_(R). As easily seen from the definitions of P and B, the reverse relations are (when neglecting e in the definition of B) P_(L)=BP/(B+1), and P_(R)=P/(B+1).
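The per-band level/balance analysis and its inverse can be sketched as below, assuming subband samples from a constant-bandwidth filterbank arranged as a list of time slots, each a list of channel samples. The band edges are hypothetical, and the dB conversion, delta coding and Huffman coding steps are omitted.

```python
def band_powers(subband_frames, band_edges):
    """Time/frequency average of squared subband samples per band.

    subband_frames: list over time slots of lists over filterbank channels.
    band_edges: channel indices delimiting the bands, e.g. [0, 2, 5, 9].
    """
    powers = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        vals = [x * x for frame in subband_frames for x in frame[lo:hi]]
        powers.append(sum(vals) / len(vals))
    return powers


def to_pb(p_left, p_right, e=1e-10):
    """Per-band level P = P_L + P_R and balance B = (P_L + e) / (P_R + e)."""
    p = [l + r for l, r in zip(p_left, p_right)]
    b = [(l + e) / (r + e) for l, r in zip(p_left, p_right)]
    return p, b


def from_pb(p, b):
    """Reverse relations, neglecting e: P_L = B*P/(B+1), P_R = P/(B+1)."""
    p_left = [bb * pp / (bb + 1) for pp, bb in zip(p, b)]
    p_right = [pp / (bb + 1) for pp, bb in zip(p, b)]
    return p_left, p_right
```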

One particularly interesting application of the above envelope coding method is coding of highband spectral envelopes for HFR-based codecs. In this case no highband residual signal is transmitted. Instead this residual is derived from the lowband. Thus, there is no strict relation between residual and envelope representation, and envelope quantization is more crucial. In order to study the effects of quantization, let Pq and Bq denote the quantized values of P and B respectively. Pq and Bq are then inserted into the above relations, and the sum is formed: P_(L)q+P_(R)q=BqPq/(Bq+1)+Pq/(Bq+1)=Pq(Bq+1)/(Bq+1)=Pq. The interesting feature here is that Bq is eliminated, and the error in total power is solely determined by the quantization error in P. This implies that even though B is heavily quantized, the perceived level is correct, assuming that sufficient precision is used in the quantization of P. In other words, distortion in B maps to distortion in space, rather than in level. As long as the sound sources are stationary in space over time, this distortion in the stereo perspective is also stationary, and hard to notice. As already stated, the quantization of the stereo-balance can also be coarser towards the outer extremes, since a given error in dB corresponds to a smaller error in perceived angle when the angle to the centerline is large, due to properties of human hearing.
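A small numerical check of this property, assuming P is transmitted exactly and B is quantized with a deliberately coarse, hypothetical 6 dB step: the reconstructed total power is unchanged, while the left/right ratio moves to the quantized balance.

```python
import math


def quantize_db(value_db, step_db):
    """Uniform quantization in the dB domain (illustrative coarse quantizer)."""
    return step_db * round(value_db / step_db)


p_left, p_right = 3.0, 1.0
p = p_left + p_right
b_db = 10 * math.log10(p_left / p_right)  # about 4.77 dB

pq = p                                    # P assumed transmitted precisely
bq = 10 ** (quantize_db(b_db, 6.0) / 10)  # B rounded to the 6 dB grid
pl_q, pr_q = bq * pq / (bq + 1), pq / (bq + 1)

print(round(pl_q + pr_q, 12))                  # 4.0: total power is preserved
print(round(10 * math.log10(pl_q / pr_q), 2))  # 6.0: the error is a position shift
```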

When quantizing frequency-dependent data, e.g., multiband stereo-width gain values or multiband balance values, the resolution and range of the quantization method can advantageously be selected to match the properties of a perceptual scale. If such a scale is made frequency dependent, different quantization methods, or so-called quantization classes, can be chosen for the different frequency bands. The encoded parameter values representing the different frequency bands should then in some cases, even if they have identical values, be interpreted in different ways, i.e., be decoded into different values.
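A minimal sketch of band-dependent quantization classes, assuming each class is simply a reconstruction table indexed by the transmitted code. The tables and the band-to-class assignment below are hypothetical examples, not values from the source; the point is that an identical code decodes to different values in different bands.

```python
# Hypothetical quantization classes: reconstruction tables (balance in dB)
# indexed by the transmitted code. The values are illustrative only.
QUANT_CLASSES = {
    "fine": [-12.0, -6.0, -3.0, 0.0, 3.0, 6.0, 12.0],
    "coarse": [-24.0, -12.0, -6.0, 0.0, 6.0, 12.0, 24.0],
}

# Hypothetical assignment of a quantization class to each frequency band.
BAND_CLASS = ["fine", "fine", "coarse", "coarse"]


def dequantize(indices, band_class=BAND_CLASS, classes=QUANT_CLASSES):
    """Decode one index per band using that band's quantization class."""
    return [classes[band_class[k]][idx] for k, idx in enumerate(indices)]


# The same code (5) in every band decodes to different values per class.
values = dequantize([5, 5, 5, 5])  # [6.0, 6.0, 12.0, 12.0]
```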

Analogous to a switched L/R- to S/D-coding scheme, the P and B signals may be adaptively substituted by the P_(L) and P_(R) signals, in order to better cope with extreme signals. As taught by [PCT/SE00/00158], delta coding of envelope samples can be switched from delta-in-time to delta-in-frequency, depending on which direction is most efficient in terms of number of bits at a particular moment. The balance parameter can also take advantage of this scheme. Consider, for example, a source that moves in the stereo field over time. Clearly, this corresponds to a successive change of balance values over time, which, depending on the speed of the source versus the update rate of the parameters, may correspond to large delta-in-time values, and hence to large codewords when employing entropy coding. However, assuming that the source has uniform sound radiation versus frequency, the delta-in-frequency values of the balance parameter are zero at every point in time, corresponding to small codewords. Thus, a lower bitrate is achieved in this case when using the frequency delta coding direction. Another example is a source that is stationary in the room but has a non-uniform radiation. Now the delta-in-frequency values are large, and delta-in-time is the preferred choice.
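The choice of delta direction can be sketched as a per-frame bit count comparison. This Python sketch is illustrative only: it substitutes a signed order-0 Exp-Golomb code length for the codec's Huffman tables, and it assumes that in frequency mode the lowest band is still coded relative to the previous frame. The two usage lines reproduce the moving-source and stationary-source examples above.

```python
def code_length(delta):
    """Hypothetical cost model: bits of a signed order-0 Exp-Golomb code,
    standing in for the Huffman tables of a real codec."""
    u = 2 * delta - 1 if delta > 0 else -2 * delta  # signed -> unsigned mapping
    return 2 * (u + 1).bit_length() - 1


def choose_delta_direction(current, previous):
    """Return ('time' or 'freq', bits) for one frame of integer balance indices.
    In both modes the lowest band is coded relative to the previous frame
    (an assumption that keeps the comparison simple)."""
    time_bits = sum(code_length(c - p) for c, p in zip(current, previous))
    freq_bits = code_length(current[0] - previous[0]) + sum(
        code_length(current[k] - current[k - 1]) for k in range(1, len(current))
    )
    if freq_bits < time_bits:
        return "freq", freq_bits
    return "time", time_bits


# Source moving across the stereo field, uniform radiation over frequency:
moving = choose_delta_direction([5, 5, 5, 5], [2, 2, 2, 2])        # ('freq', 8)
# Stationary source with non-uniform radiation:
stationary = choose_delta_direction([0, 4, -3, 6], [0, 4, -3, 6])  # ('time', 4)
```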

The PB-coding scheme offers the possibility to build a scalable HFR-codec, see FIG. 4. A scalable codec is characterized in that the bitstream is split into two or more parts, where the reception and decoding of higher-order parts is optional. The example assumes two bitstream parts, hereinafter referred to as primary, 419, and secondary, 417, but extension to a higher number of parts is clearly possible. The encoder side, FIG. 4a, comprises an arbitrary stereo lowband encoder, 403, which operates on the stereo input signal, IN (the trivial steps of AD- and DA-conversion are not shown in the figure); a parametric stereo encoder, 401, which estimates the highband spectral envelope, and optionally additional stereo parameters, and which also operates on the stereo input signal; and two multiplexers, 415 and 413, for the primary and secondary bitstreams respectively. In this application, the highband envelope coding is locked to P/B-operation, and the P signal, 407, is sent to the primary bitstream by means of 415, whereas the B signal, 405, is sent to the secondary bitstream by means of 413.

For the lowband codec, different possibilities exist. It may constantly operate in S/D-mode, with the S and D signals sent to the primary and secondary bitstreams respectively. In this case, decoding the primary bitstream results in a full-band mono signal. Of course, this mono signal can be enhanced by parametric stereo methods according to the invention, in which case the stereo-parameter(s) must also be located in the primary bitstream. Another possibility is to feed a stereo-coded lowband signal to the primary bitstream, optionally together with highband width- and balance-parameters. Now decoding of the primary bitstream results in true stereo for the lowband, and very realistic pseudo-stereo for the highband, since the stereo properties of the lowband are reflected in the high frequency reconstruction. Stated another way: even though the available highband envelope representation, or spectral coarse structure, is in mono, the synthesized highband residual, or spectral fine structure, is not. In this type of implementation, the secondary bitstream may contain more lowband information, which, when combined with that of the primary bitstream, yields a higher quality lowband reproduction. The topology of FIG. 4 illustrates both cases, since the primary and secondary lowband encoder output signals, 411 and 409, connected to 415 and 417 respectively, may contain either of the above described signal types.

The bitstreams are transmitted or stored, and either only 419 or both 419 and 417 are fed to the decoder, FIG. 4b. The primary bitstream is demultiplexed by 423 into the lowband core decoder primary signal, 429, and the P signal, 431. Similarly, the secondary bitstream is demultiplexed by 421 into the lowband core decoder secondary signal, 427, and the B signal, 425. The lowband signal(s) is(are) routed to the lowband decoder, 433, which produces an output, 435, which again, in case of decoding of the primary bitstream only, may be of either type described above (mono or stereo). The signal 435 feeds the HFR-unit, 437, wherein a synthetic highband is generated and adjusted according to P, which is also connected to the HFR-unit. The decoded lowband is combined with the highband in the HFR-unit, and the lowband and/or highband is optionally enhanced by a pseudo-stereo generator (also situated in the HFR-unit), before finally being fed to the system outputs, forming the output signal, OUT. When the secondary bitstream, 417, is present, the HFR-unit also gets the B signal, 425, as an input signal, and 435 is in stereo, whereby the system produces a full stereo output signal, and pseudo-stereo generators, if any, are bypassed.
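The split of P and B between the two bitstream parts, and the resulting highband envelope at the decoder, can be sketched as follows. This is a simplified illustration under stated assumptions: bitstreams are modelled as plain dictionaries with placeholder lowband payloads, and when only the primary part is received the mono envelope P is split equally between the channels (equivalent to assuming a centred balance, B=1), which is a hypothetical choice rather than a rule from the source.

```python
def split_bitstreams(level, balance, lowband_primary, lowband_secondary):
    """Encoder side: P goes to the primary part, B to the secondary part."""
    primary = {"lowband": lowband_primary, "P": list(level)}
    secondary = {"lowband": lowband_secondary, "B": list(balance)}
    return primary, secondary


def highband_envelope(primary, secondary=None):
    """Decoder side: per-band target powers for the synthetic highband."""
    level = primary["P"]
    if secondary is None:
        # Primary part only: the envelope is mono. Splitting P equally
        # (centred balance, B = 1) is a hypothetical choice; a pseudo-stereo
        # generator would then add width.
        half = [p / 2.0 for p in level]
        return "mono", half, list(half)
    balance = secondary["B"]
    left = [b * p / (b + 1.0) for p, b in zip(level, balance)]
    right = [p / (b + 1.0) for p, b in zip(level, balance)]
    return "stereo", left, right


# "core-S" and "core-D" are placeholder lowband payloads.
primary, secondary = split_bitstreams([4.0, 2.0], [3.0, 1.0], "core-S", "core-D")
mono_env = highband_envelope(primary)                 # ('mono', [2.0, 1.0], [2.0, 1.0])
stereo_env = highband_envelope(primary, secondary)    # ('stereo', [3.0, 1.0], [1.0, 1.0])
```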

In other words, a method for coding of stereo properties of an input signal includes, at an encoder, the step of calculating a width-parameter that signals a stereo-width of said input signal, and, at a decoder, a step of generating a stereo output signal, using said width-parameter to control a stereo-width of said output signal. The method further comprises, at said encoder, forming a mono signal from said input signal, wherein, at said decoder, said generation implies a pseudo-stereo method operating on said mono signal. The method further implies splitting of said mono signal into two signals as well as addition of delayed version(s) of said mono signal to said two signals, at level(s) controlled by said width-parameter. The method further includes that said delayed version(s) are high-pass filtered and progressively attenuated at higher frequencies prior to being added to said two signals. The method further includes that said width-parameter is a vector, and the elements of said vector correspond to separate frequency bands. The method further includes that if said input signal is of type dual mono, said output signal is also of type dual mono.
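A minimal broadband sketch of the width-controlled pseudo-stereo step: the mono signal is split into two channels and a delayed copy is added at a level set by the width-parameter. The opposite signs of the delayed copy in the two channels are an assumption, and the high-pass filtering and progressive attenuation of the delayed version are omitted for brevity. With a width of zero the two outputs are identical, i.e., dual mono.

```python
def pseudo_stereo(mono, width, delay):
    """Split a mono signal into two channels and add a delayed copy at a
    level set by the width parameter, with opposite signs in the two
    channels (sign convention is an assumption). High-pass filtering and
    frequency-dependent attenuation of the delayed copy are omitted."""
    left, right = [], []
    for n, x in enumerate(mono):
        delayed = mono[n - delay] if n >= delay else 0.0
        left.append(x + width * delayed)
        right.append(x - width * delayed)
    return left, right


wide = pseudo_stereo([1.0, 0.0, 0.0], width=0.5, delay=1)
# ([1.0, 0.5, 0.0], [1.0, -0.5, 0.0])
narrow = pseudo_stereo([1.0, 0.0, 0.0], width=0.0, delay=1)
# width 0 gives identical channels (dual mono)
```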

A method for coding of stereo properties of an input signal includes, at an encoder, calculating a balance-parameter that signals a stereo-balance of said input signal, and, at a decoder, generating a stereo output signal, using said balance-parameter to control a stereo-balance of said output signal.
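One way a balance-parameter B could control the stereo-balance of a mono signal that is split into two channels is sketched below. The gain normalisation is an assumption, not taken from the source: the gains are chosen so that the power ratio of the two outputs equals B, and a centred balance (B=1) leaves both copies unchanged.

```python
import math


def balance_gains(balance):
    """Gains for the two copies of the mono signal such that
    g_L**2 / g_R**2 equals B and B = 1 gives unity gains
    (this normalisation is an assumption)."""
    g_left = math.sqrt(2.0 * balance / (balance + 1.0))
    g_right = math.sqrt(2.0 / (balance + 1.0))
    return g_left, g_right


def apply_balance(mono, balance):
    """Split the mono signal into two channels and adjust their levels."""
    g_left, g_right = balance_gains(balance)
    return [g_left * x for x in mono], [g_right * x for x in mono]


left, right = apply_balance([1.0, -2.0], balance=4.0)
```

In a multiband system the same gains would be applied per frequency band, with one B value per band.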

In this method, at said encoder, a mono signal is formed from said input signal, and, at said decoder, said generation implies splitting of said mono signal into two signals, and said control implies adjustment of levels of said two signals. The method further includes that a power for each channel of said input signal is calculated, and said balance-parameter is calculated from a quotient between said powers. The method further includes that said powers and said balance-parameter are vectors where every element corresponds to a specific frequency band. The method further includes that, at said decoder, interpolation is performed between two temporally consecutive values of said balance-parameter in such a way that the momentary value of the corresponding power of said mono signal controls how steep the momentary interpolation should be. The method further includes that said interpolation is performed on balance values represented as logarithmic values. The method further includes that said values of balance parameters are limited to a range between a previous balance value and a balance value extracted from other balance values by a median filter or other filter process, where said range can be further extended by moving the borders of said range by a certain factor. The method further includes that said method of extracting limiting borders for balance values is, for a multiband system, frequency dependent. The method further includes that an additional level-parameter is calculated as a vector sum of said powers and sent to said decoder, thereby providing said decoder a representation of a spectral envelope of said input signal. The method further includes that said level-parameter and said balance-parameter are adaptively replaced by said powers.
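The limiting of balance values described above can be sketched as a clamp on logarithmic (dB) balance values. In this illustration the "other balance values" are simply whatever list is passed in (e.g., neighbouring bands), and the extension of the range by a factor is interpreted as widening both borders by a fraction of the range width; both are assumptions.

```python
import statistics


def limit_balance(current_db, previous_db, other_db, extend=0.0):
    """Clamp a balance value (dB) to the range spanned by the previous value
    and the median of other balance values, optionally widened by a factor.
    The choice of 'other' values and the widening rule are assumptions."""
    reference = statistics.median(other_db)
    low, high = min(previous_db, reference), max(previous_db, reference)
    margin = extend * (high - low)  # move both borders outwards
    return min(max(current_db, low - margin), high + margin)


clamped = limit_balance(10.0, 0.0, [2.0, 4.0, 6.0])              # 4.0
widened = limit_balance(10.0, 0.0, [2.0, 4.0, 6.0], extend=0.5)  # 6.0
inside = limit_balance(3.0, 0.0, [2.0, 4.0, 6.0])                # 3.0
```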
The method further includes that said spectral envelope is used to control an HFR-process in a decoder. The method further includes that said level-parameter is fed into a primary bitstream of a scalable HFR-based stereo codec, and said balance-parameter is fed into a secondary bitstream of said codec. Said mono signal and said width-parameter are fed into said primary bitstream. Furthermore, said width-parameters are processed by a function that gives smaller values for a balance value that corresponds to a balance position further from the center position. The method further includes that a quantization of said balance-parameter employs smaller quantization steps around a center position and larger steps towards outer positions. The method further includes that said width-parameters and said balance-parameters are quantized using a quantization method whose resolution and range are, for a multiband system, frequency dependent. The method further includes that said balance parameter is adaptively delta-coded either in time or in frequency. The method further includes that said input signal is passed through a Hilbert transformer prior to forming said mono signal.
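A nearest-level quantizer with smaller steps around the center position and larger steps towards the outer positions can be sketched as below. The level table is a hypothetical example; the usage lines show that the same input offset from a level costs little near the center, while larger errors are tolerated far from it.

```python
# Hypothetical reconstruction levels for the balance parameter in dB:
# small steps around the centre (0 dB), larger steps towards the extremes.
BALANCE_LEVELS_DB = [-30.0, -20.0, -12.0, -6.0, -3.0, 0.0, 3.0, 6.0, 12.0, 20.0, 30.0]


def quantize_balance(balance_db, levels=BALANCE_LEVELS_DB):
    """Nearest-level quantization; returns (index, reconstructed value)."""
    index = min(range(len(levels)), key=lambda i: abs(levels[i] - balance_db))
    return index, levels[index]


near_centre = quantize_balance(1.0)   # (5, 0.0): 1 dB error near the centre
far_out = quantize_balance(17.0)      # (9, 20.0): 3 dB error far from the centre
```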

An apparatus for parametric stereo coding includes, at an encoder, means for calculation of a width-parameter that signals a stereo-width of an input signal, and means for forming a mono signal from said input signal, and, at a decoder, means for generating a stereo output signal from said mono signal, using said width-parameter to control a stereo-width of said output signal.

1. A decoder for decoding a bit stream comprising an encoded lowband signal and an encoded power spectral envelope of a stereo or multichannel signal having a first channel and a second channel, the first channel and the second channel having a set of frequency bands, the decoder comprising one or more processing elements that: receive the encoded lowband signal included in the bit stream; produce a lowband output signal, the lowband output signal having a lowband stereo signal; receive the encoded power spectral envelope of the stereo signal or the multichannel signal having the first channel and the second channel, wherein the encoded power spectral envelope comprises Huffman coded balance and level parameters for the set of frequency bands; Huffman decode the Huffman coded balance and level parameters to obtain decoded balance and level parameters, wherein the decoded level parameters represent a total power of the first channel and the second channel for the set of frequency bands; convert the decoded balance and level parameters into power values of the first channel and the second channel; generate a synthetic highband signal using the lowband output signal; adjust a spectral envelope of the synthetic highband signal using the power values of the first channel and the second channel to obtain an adjusted highband signal; and combine the adjusted highband signal and the lowband output signal to obtain a decoded stereo signal or a decoded multichannel signal.
 2. A method of decoding a bit stream comprising an encoded lowband signal and an encoded power spectral envelope of a stereo or multichannel signal having a first channel and a second channel, the first channel and the second channel having a set of frequency bands, the method comprising: receiving the encoded lowband signal included in the bit stream; producing a lowband output signal, the lowband output signal having a lowband stereo signal; receiving the encoded power spectral envelope of the stereo signal or the multichannel signal having the first channel and the second channel, wherein the encoded power spectral envelope comprises Huffman coded balance and level parameters for the set of frequency bands; Huffman decoding the Huffman coded balance and level parameters to obtain decoded balance and level parameters for the set of frequency bands, wherein the decoded level parameters represent a total power of the first channel and the second channel for the set of frequency bands; converting the decoded balance and level parameters into power values of the first channel and the second channel; generating a synthetic highband signal using the lowband output signal; adjusting a spectral envelope of the synthetic highband signal using the power values of the first channel and the second channel to obtain an adjusted highband signal; and combining the adjusted highband signal and the lowband output signal to obtain a decoded stereo signal or a decoded multichannel signal.
 3. A non-transitory storage medium having stored thereon a computer program for performing a method of decoding a bit stream comprising an encoded lowband signal and an encoded power spectral envelope of a stereo or multichannel signal having a first channel and a second channel, the first channel and the second channel having a set of frequency bands, the method comprising: receiving the encoded lowband signal included in the bit stream; producing a lowband output signal, the lowband output signal having a lowband stereo signal; receiving the encoded power spectral envelope of the stereo signal or the multichannel signal having the first channel and the second channel, wherein the encoded power spectral envelope comprises Huffman coded balance and level parameters for the set of frequency bands; Huffman decoding the Huffman coded balance and level parameters to obtain decoded balance and level parameters for the set of frequency bands, wherein the decoded level parameters represent a total power of the first channel and the second channel for the set of frequency bands; converting the decoded balance and level parameters into power values of the first channel and the second channel; generating a synthetic highband signal using the lowband output signal; adjusting a spectral envelope of the synthetic highband signal using the power values of the first channel and the second channel to obtain an adjusted highband signal; and combining the adjusted highband signal and the lowband output signal to obtain a decoded stereo signal or a decoded multichannel signal.